In this project we are going through the COVID-19 data from Johns Hopkins University to build a full world status report. The project is divided into three parts: data cleaning, chart building, and report generation.
We start our project by loading the needed packages and the Johns Hopkins University data.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from fpdf import FPDF
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta
import pycountry
confirmed_link = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
death_link = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
recovered_link = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv'
confirmed_df = pd.read_csv(confirmed_link)
confirmed_df.tail()
| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 2/28/23 | 3/1/23 | 3/2/23 | 3/3/23 | 3/4/23 | 3/5/23 | 3/6/23 | 3/7/23 | 3/8/23 | 3/9/23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 284 | NaN | West Bank and Gaza | 31.952200 | 35.233200 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 |
| 285 | NaN | Winter Olympics 2022 | 39.904200 | 116.407400 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 |
| 286 | NaN | Yemen | 15.552727 | 48.516388 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 |
| 287 | NaN | Zambia | -13.133897 | 27.849332 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 343012 | 343012 | 343079 | 343079 | 343079 | 343135 | 343135 | 343135 | 343135 | 343135 |
| 288 | NaN | Zimbabwe | -19.015438 | 29.154857 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 263921 | 264127 | 264127 | 264127 | 264127 | 264127 | 264127 | 264127 | 264276 | 264276 |
5 rows × 1147 columns
death_df = pd.read_csv(death_link)
death_df.tail()
| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 2/28/23 | 3/1/23 | 3/2/23 | 3/3/23 | 3/4/23 | 3/5/23 | 3/6/23 | 3/7/23 | 3/8/23 | 3/9/23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 284 | NaN | West Bank and Gaza | 31.952200 | 35.233200 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 5708 | 5708 | 5708 | 5708 | 5708 | 5708 | 5708 | 5708 | 5708 | 5708 |
| 285 | NaN | Winter Olympics 2022 | 39.904200 | 116.407400 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 286 | NaN | Yemen | 15.552727 | 48.516388 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2159 | 2159 | 2159 | 2159 | 2159 | 2159 | 2159 | 2159 | 2159 | 2159 |
| 287 | NaN | Zambia | -13.133897 | 27.849332 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4057 | 4057 | 4057 | 4057 | 4057 | 4057 | 4057 | 4057 | 4057 | 4057 |
| 288 | NaN | Zimbabwe | -19.015438 | 29.154857 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 5663 | 5668 | 5668 | 5668 | 5668 | 5668 | 5668 | 5668 | 5671 | 5671 |
5 rows × 1147 columns
recovered_df = pd.read_csv(recovered_link)
recovered_df.tail()
| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 2/28/23 | 3/1/23 | 3/2/23 | 3/3/23 | 3/4/23 | 3/5/23 | 3/6/23 | 3/7/23 | 3/8/23 | 3/9/23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 269 | NaN | West Bank and Gaza | 31.952200 | 35.233200 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 270 | NaN | Winter Olympics 2022 | 39.904200 | 116.407400 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 271 | NaN | Yemen | 15.552727 | 48.516388 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 272 | NaN | Zambia | -13.133897 | 27.849332 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 273 | NaN | Zimbabwe | -19.015438 | 29.154857 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 1147 columns
We now have three dataframes containing information about the confirmed, death, and recovered cases. The data is in wide format: one row per country (or province) and one column per date. There are 289 rows in the confirmed and death dataframes and 274 in the recovered dataframe, which means that recovered case information is missing for some countries.
Let us reshape the data into a more suitable format, so that we can have a column with the dates and a column with the cases.
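Conceptually, `pd.melt` keeps the identifier columns fixed and unpivots every date column into (Date, Cases) rows. A minimal sketch on toy data (not the JHU frame):

```python
import pandas as pd

# toy wide frame: one column per date
wide = pd.DataFrame({
    'Country/Region': ['A', 'B'],
    '1/22/20': [0, 1],
    '1/23/20': [2, 3],
})
long = pd.melt(wide, id_vars=['Country/Region'],
               var_name='Date', value_name='Cases')
print(len(long))  # 2 countries x 2 dates = 4 rows
```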
confirmed_df = pd.melt(confirmed_df, id_vars=confirmed_df.columns[0:4], value_vars=confirmed_df.columns[4:],
var_name='Date', value_name='Cases')
confirmed_df.tail()
| | Province/State | Country/Region | Lat | Long | Date | Cases |
|---|---|---|---|---|---|---|
| 330322 | NaN | West Bank and Gaza | 31.952200 | 35.233200 | 3/9/23 | 703228 |
| 330323 | NaN | Winter Olympics 2022 | 39.904200 | 116.407400 | 3/9/23 | 535 |
| 330324 | NaN | Yemen | 15.552727 | 48.516388 | 3/9/23 | 11945 |
| 330325 | NaN | Zambia | -13.133897 | 27.849332 | 3/9/23 | 343135 |
| 330326 | NaN | Zimbabwe | -19.015438 | 29.154857 | 3/9/23 | 264276 |
death_df = pd.melt(death_df, id_vars=death_df.columns[0:4], value_vars=death_df.columns[4:],
var_name='Date', value_name='Cases')
death_df.tail()
| | Province/State | Country/Region | Lat | Long | Date | Cases |
|---|---|---|---|---|---|---|
| 330322 | NaN | West Bank and Gaza | 31.952200 | 35.233200 | 3/9/23 | 5708 |
| 330323 | NaN | Winter Olympics 2022 | 39.904200 | 116.407400 | 3/9/23 | 0 |
| 330324 | NaN | Yemen | 15.552727 | 48.516388 | 3/9/23 | 2159 |
| 330325 | NaN | Zambia | -13.133897 | 27.849332 | 3/9/23 | 4057 |
| 330326 | NaN | Zimbabwe | -19.015438 | 29.154857 | 3/9/23 | 5671 |
recovered_df = pd.melt(recovered_df, id_vars=recovered_df.columns[0:4], value_vars=recovered_df.columns[4:],
var_name='Date', value_name='Cases')
recovered_df.tail()
| | Province/State | Country/Region | Lat | Long | Date | Cases |
|---|---|---|---|---|---|---|
| 313177 | NaN | West Bank and Gaza | 31.952200 | 35.233200 | 3/9/23 | 0 |
| 313178 | NaN | Winter Olympics 2022 | 39.904200 | 116.407400 | 3/9/23 | 0 |
| 313179 | NaN | Yemen | 15.552727 | 48.516388 | 3/9/23 | 0 |
| 313180 | NaN | Zambia | -13.133897 | 27.849332 | 3/9/23 | 0 |
| 313181 | NaN | Zimbabwe | -19.015438 | 29.154857 | 3/9/23 | 0 |
Now let us check for any missing values.
confirmed_df.isna().sum()
Province/State 226314 Country/Region 0 Lat 2286 Long 2286 Date 0 Cases 0 dtype: int64
death_df.isna().sum()
Province/State 226314 Country/Region 0 Lat 2286 Long 2286 Date 0 Cases 0 dtype: int64
recovered_df.isna().sum()
Province/State 227457 Country/Region 0 Lat 1143 Long 1143 Date 0 Cases 0 dtype: int64
Most of the Province/State values are missing, so we are going to drop this column and aggregate the Cases by the Country/Region column. We also drop the Lat and Long columns, since we will be using the pycountry package for the map visualizations.
confirmed_df = confirmed_df.iloc[:, [1, 4, 5]]
death_df = death_df.iloc[:, [1, 4, 5]]
recovered_df = recovered_df.iloc[:, [1, 4, 5]]
confirmed_df['Cases'] = confirmed_df.groupby(['Country/Region', 'Date'])['Cases'].transform('sum')
confirmed_df = confirmed_df.drop_duplicates(subset=['Country/Region', 'Date', 'Cases']).reset_index(drop=True)
confirmed_df.tail()
| | Country/Region | Date | Cases |
|---|---|---|---|
| 229738 | West Bank and Gaza | 3/9/23 | 703228 |
| 229739 | Winter Olympics 2022 | 3/9/23 | 535 |
| 229740 | Yemen | 3/9/23 | 11945 |
| 229741 | Zambia | 3/9/23 | 343135 |
| 229742 | Zimbabwe | 3/9/23 | 264276 |
death_df['Cases'] = death_df.groupby(['Country/Region', 'Date'])['Cases'].transform('sum')
death_df = death_df.drop_duplicates(subset=['Country/Region', 'Date', 'Cases']).reset_index(drop=True)
death_df.tail()
| | Country/Region | Date | Cases |
|---|---|---|---|
| 229738 | West Bank and Gaza | 3/9/23 | 5708 |
| 229739 | Winter Olympics 2022 | 3/9/23 | 0 |
| 229740 | Yemen | 3/9/23 | 2159 |
| 229741 | Zambia | 3/9/23 | 4057 |
| 229742 | Zimbabwe | 3/9/23 | 5671 |
recovered_df['Cases'] = recovered_df.groupby(['Country/Region', 'Date'])['Cases'].transform('sum')
recovered_df = recovered_df.drop_duplicates(subset=['Country/Region', 'Date', 'Cases']).reset_index(drop=True)
recovered_df.tail()
| | Country/Region | Date | Cases |
|---|---|---|---|
| 229738 | West Bank and Gaza | 3/9/23 | 0 |
| 229739 | Winter Olympics 2022 | 3/9/23 | 0 |
| 229740 | Yemen | 3/9/23 | 0 |
| 229741 | Zambia | 3/9/23 | 0 |
| 229742 | Zimbabwe | 3/9/23 | 0 |
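As a design note, the transform-then-drop_duplicates pattern used above is equivalent to a plain groupby aggregation; a sketch on toy data:

```python
import pandas as pd

# toy frame with two provinces of country A on the same date
toy = pd.DataFrame({
    'Country/Region': ['A', 'A', 'B'],
    'Date': ['1/22/20', '1/22/20', '1/22/20'],
    'Cases': [1, 2, 5],
})
# one row per (country, date), with the provinces summed
agg = toy.groupby(['Country/Region', 'Date'], as_index=False)['Cases'].sum()
print(agg['Cases'].tolist())  # [3, 5]
```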
Now all three dataframes have the same number of rows (229743). In each dataframe, the Cases column holds the accumulated number of cases for a given country. Let us create a column with the daily new cases. To do so, we need to order the rows by the Date field, so we first convert it to the datetime type.
confirmed_df['Date'] = pd.to_datetime(confirmed_df['Date'])
death_df['Date'] = pd.to_datetime(death_df['Date'])
recovered_df['Date'] = pd.to_datetime(recovered_df['Date'])
confirmed_df = confirmed_df.sort_values(['Country/Region', 'Date']).reset_index(drop=True)
death_df = death_df.sort_values(['Country/Region', 'Date']).reset_index(drop=True)
recovered_df = recovered_df.sort_values(['Country/Region', 'Date']).reset_index(drop=True)
confirmed_df['New_Cases'] = confirmed_df.groupby('Country/Region')['Cases'].transform('diff')
confirmed_df.head()
| | Country/Region | Date | Cases | New_Cases |
|---|---|---|---|---|
| 0 | Afghanistan | 2020-01-22 | 0 | NaN |
| 1 | Afghanistan | 2020-01-23 | 0 | 0.0 |
| 2 | Afghanistan | 2020-01-24 | 0 | 0.0 |
| 3 | Afghanistan | 2020-01-25 | 0 | 0.0 |
| 4 | Afghanistan | 2020-01-26 | 0 | 0.0 |
death_df['New_Cases'] = death_df.groupby('Country/Region')['Cases'].transform('diff')
death_df.head()
| | Country/Region | Date | Cases | New_Cases |
|---|---|---|---|---|
| 0 | Afghanistan | 2020-01-22 | 0 | NaN |
| 1 | Afghanistan | 2020-01-23 | 0 | 0.0 |
| 2 | Afghanistan | 2020-01-24 | 0 | 0.0 |
| 3 | Afghanistan | 2020-01-25 | 0 | 0.0 |
| 4 | Afghanistan | 2020-01-26 | 0 | 0.0 |
recovered_df['New_Cases'] = recovered_df.groupby('Country/Region')['Cases'].transform('diff')
recovered_df.head()
| | Country/Region | Date | Cases | New_Cases |
|---|---|---|---|---|
| 0 | Afghanistan | 2020-01-22 | 0 | NaN |
| 1 | Afghanistan | 2020-01-23 | 0 | 0.0 |
| 2 | Afghanistan | 2020-01-24 | 0 | 0.0 |
| 3 | Afghanistan | 2020-01-25 | 0 | 0.0 |
| 4 | Afghanistan | 2020-01-26 | 0 | 0.0 |
Now, in each dataframe, the New_Cases column holds a null value in the first row of every country, since diff has no previous day to subtract there. To correct this, we replace the null values with the corresponding value of the Cases column.
confirmed_df['New_Cases'] = np.where(confirmed_df['New_Cases'].isna(), confirmed_df['Cases'], confirmed_df['New_Cases'])
confirmed_df.head()
| | Country/Region | Date | Cases | New_Cases |
|---|---|---|---|---|
| 0 | Afghanistan | 2020-01-22 | 0 | 0.0 |
| 1 | Afghanistan | 2020-01-23 | 0 | 0.0 |
| 2 | Afghanistan | 2020-01-24 | 0 | 0.0 |
| 3 | Afghanistan | 2020-01-25 | 0 | 0.0 |
| 4 | Afghanistan | 2020-01-26 | 0 | 0.0 |
death_df['New_Cases'] = np.where(death_df['New_Cases'].isna(), death_df['Cases'], death_df['New_Cases'])
death_df.head()
| | Country/Region | Date | Cases | New_Cases |
|---|---|---|---|---|
| 0 | Afghanistan | 2020-01-22 | 0 | 0.0 |
| 1 | Afghanistan | 2020-01-23 | 0 | 0.0 |
| 2 | Afghanistan | 2020-01-24 | 0 | 0.0 |
| 3 | Afghanistan | 2020-01-25 | 0 | 0.0 |
| 4 | Afghanistan | 2020-01-26 | 0 | 0.0 |
recovered_df['New_Cases'] = np.where(recovered_df['New_Cases'].isna(), recovered_df['Cases'], recovered_df['New_Cases'])
recovered_df.head()
| | Country/Region | Date | Cases | New_Cases |
|---|---|---|---|---|
| 0 | Afghanistan | 2020-01-22 | 0 | 0.0 |
| 1 | Afghanistan | 2020-01-23 | 0 | 0.0 |
| 2 | Afghanistan | 2020-01-24 | 0 | 0.0 |
| 3 | Afghanistan | 2020-01-25 | 0 | 0.0 |
| 4 | Afghanistan | 2020-01-26 | 0 | 0.0 |
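Put together, the diff-then-fill steps behave like this on a toy frame (toy numbers, not the real data):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B'],
    'Cases':   [0, 3, 7, 5, 9],   # cumulative counts
})
# daily new cases; the first row of each country comes out NaN
df['New_Cases'] = df.groupby('Country')['Cases'].diff()
# fall back to the cumulative value on each country's first day
df['New_Cases'] = np.where(df['New_Cases'].isna(), df['Cases'], df['New_Cases'])
print(df['New_Cases'].tolist())  # [0.0, 3.0, 4.0, 5.0, 4.0]
```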
To finish our data cleaning process, we are going to merge the three dataframes, rename the Country/Region column to Country, look up the country codes, remove the non-country rows (such as Winter Olympics 2022) and save the data in this tidy format to a CSV file.
covid_df = confirmed_df.merge(death_df, on=['Country/Region', 'Date'], suffixes=['_Confirmed', '_Death'])
covid_df = covid_df.merge(recovered_df, on=['Country/Region', 'Date'])
covid_df = covid_df.rename(columns={'Country/Region':'Country', 'Cases':'Cases_Recovered', 'New_Cases':'New_Cases_Recovered'})
covid_df.tail()
| | Country | Date | Cases_Confirmed | New_Cases_Confirmed | Cases_Death | New_Cases_Death | Cases_Recovered | New_Cases_Recovered |
|---|---|---|---|---|---|---|---|---|
| 229738 | Zimbabwe | 2023-03-05 | 264127 | 0.0 | 5668 | 0.0 | 0 | 0.0 |
| 229739 | Zimbabwe | 2023-03-06 | 264127 | 0.0 | 5668 | 0.0 | 0 | 0.0 |
| 229740 | Zimbabwe | 2023-03-07 | 264127 | 0.0 | 5668 | 0.0 | 0 | 0.0 |
| 229741 | Zimbabwe | 2023-03-08 | 264276 | 149.0 | 5671 | 3.0 | 0 | 0.0 |
| 229742 | Zimbabwe | 2023-03-09 | 264276 | 0.0 | 5671 | 0.0 | 0 | 0.0 |
code_map = {'Brunei':'BRN',
'Burma':'MMR',
'Congo (Brazzaville)':'COG',
'Congo (Kinshasa)':'COD',
"Cote d'Ivoire":'CIV',
'Holy See':'VAT',
'Iran':'IRN',
'Korea, North':'PRK',
'Korea, South':'KOR',
'Kosovo':'XXK',
'Laos':'LAO',
'Micronesia':'FSM',
'Russia': 'RUS',
'Syria':'SYR',
'Taiwan*':'TWN',
'West Bank and Gaza':'PSE'}
def get_country_code(name):
    try:
        return pycountry.countries.lookup(name).alpha_3
    except LookupError:
        # fall back to the manual map; returns None for non-country names
        return code_map.get(name)
covid_df['Code'] = covid_df['Country'].apply(get_country_code)
covid_df.tail()
| | Country | Date | Cases_Confirmed | New_Cases_Confirmed | Cases_Death | New_Cases_Death | Cases_Recovered | New_Cases_Recovered | Code |
|---|---|---|---|---|---|---|---|---|---|
| 229738 | Zimbabwe | 2023-03-05 | 264127 | 0.0 | 5668 | 0.0 | 0 | 0.0 | ZWE |
| 229739 | Zimbabwe | 2023-03-06 | 264127 | 0.0 | 5668 | 0.0 | 0 | 0.0 | ZWE |
| 229740 | Zimbabwe | 2023-03-07 | 264127 | 0.0 | 5668 | 0.0 | 0 | 0.0 | ZWE |
| 229741 | Zimbabwe | 2023-03-08 | 264276 | 149.0 | 5671 | 3.0 | 0 | 0.0 | ZWE |
| 229742 | Zimbabwe | 2023-03-09 | 264276 | 0.0 | 5671 | 0.0 | 0 | 0.0 | ZWE |
covid_df = covid_df[~covid_df['Code'].isna()].reset_index(drop=True)
covid_df.to_csv('data/covid_tidy_data.csv', index=False, float_format='%.0f')
covid_df.tail()
| | Country | Date | Cases_Confirmed | New_Cases_Confirmed | Cases_Death | New_Cases_Death | Cases_Recovered | New_Cases_Recovered | Code |
|---|---|---|---|---|---|---|---|---|---|
| 225166 | Zimbabwe | 2023-03-05 | 264127 | 0.0 | 5668 | 0.0 | 0 | 0.0 | ZWE |
| 225167 | Zimbabwe | 2023-03-06 | 264127 | 0.0 | 5668 | 0.0 | 0 | 0.0 | ZWE |
| 225168 | Zimbabwe | 2023-03-07 | 264127 | 0.0 | 5668 | 0.0 | 0 | 0.0 | ZWE |
| 225169 | Zimbabwe | 2023-03-08 | 264276 | 149.0 | 5671 | 3.0 | 0 | 0.0 | ZWE |
| 225170 | Zimbabwe | 2023-03-09 | 264276 | 0.0 | 5671 | 0.0 | 0 | 0.0 | ZWE |
We ended up with 225171 rows.
In this part of the project, we are going to get some general information and plot several data charts which will later be used in the final report. Let us start by seeing the total number of confirmed, death and recovered cases around the world.
covid_df = pd.read_csv('data/covid_tidy_data.csv')
confirmed_total = covid_df.query('New_Cases_Confirmed > 0')['New_Cases_Confirmed'].sum()
death_total = covid_df.query('New_Cases_Death > 0')['New_Cases_Death'].sum()
recovered_total = covid_df.query('New_Cases_Recovered > 0')['New_Cases_Recovered'].sum()
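Filtering with `> 0` before summing guards against negative daily values, which appear when the source data is revised downwards. On toy numbers:

```python
import pandas as pd

daily = pd.DataFrame({'New_Cases_Confirmed': [5.0, -2.0, 3.0]})
# the -2.0 correction is excluded from the total
total = daily.query('New_Cases_Confirmed > 0')['New_Cases_Confirmed'].sum()
print(total)  # 8.0
```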
WIDTH = 210
HEIGHT = 297
layout = go.Layout(
autosize=False,
width=(WIDTH-10)*7,
height=100,
)
fig = go.Figure(layout=layout)
fig.add_trace(go.Indicator(
mode = "number",
value = confirmed_total/10**6,
number = {"valueformat": ".2f", "suffix":'M', 'font':{'size':56}, 'font_color':'black'},
title = {"text": "Global confirmed cases", 'font_color':'black'},
domain = {'x': [0, 0.3], 'y': [0, 1]}))
fig.add_trace(go.Indicator(
mode = "number",
value = death_total/10**6,
number = {"valueformat": ".2f", "suffix":'M', 'font':{'size':56}, 'font_color':'black'},
title = {"text": "Global death cases", 'font_color':'black'},
domain = {'x': [0.33, 0.66], 'y': [0, 1]}))
fig.add_trace(go.Indicator(
mode = "number",
value = recovered_total/10**6,
number = {"valueformat": ".2f", "suffix":'M', 'font':{'size':56}, 'font_color':'black'},
title = {"text": "Global recovered cases", 'font_color':'black'},
domain = {'x': [0.7, 1], 'y': [0, 1]}))
fig.write_image("images/kpis.png")
fig.show()
Now let us create a world map and color the countries according to their number of cases.
country_covid_df = covid_df.groupby(["Country", "Code"])['New_Cases_Confirmed'].sum().reset_index()
fig = px.choropleth(country_covid_df,
locations="Code",
color="New_Cases_Confirmed",
hover_name="Country",
#animation_frame="Date",
title="Global Cases",
projection="natural earth",
color_continuous_scale = 'Peach',
range_color=[0, 100000000],
labels={'New_Cases_Confirmed':'Number of cases'}
)
fig.write_image("images/global_cases_map.png")
fig.show()
Let us see the daily evolution of confirmed, death, and recovered cases.
fig, ax = plt.subplots(figsize=(8,2), dpi=150)
sns.lineplot(data=covid_df.groupby('Date')['New_Cases_Confirmed'].sum().reset_index(),
x='Date', y='New_Cases_Confirmed', ax=ax)
ax.set_title("Confirmed Cases by Day")
ax.set_xticks(["2020-01-30", "2021-01-30", "2022-01-30","2023-01-30"])
ax.set_ylabel("Cases")
plt.savefig("images/confirmed_lineplot.png")
plt.show()
fig, ax = plt.subplots(figsize=(8,2), dpi=150)
sns.lineplot(data=covid_df.query("New_Cases_Death>0").groupby('Date')['New_Cases_Death'].sum().reset_index(),
x='Date', y='New_Cases_Death', ax=ax, color="#AA5555")
ax.set_title("Death Cases by Day")
ax.set_xticks(["2020-01-30", "2021-01-30", "2022-01-30","2023-01-30"])
ax.set_ylabel("Cases")
plt.savefig("images/death_lineplot.png")
plt.show()
fig, ax = plt.subplots(figsize=(8,2), dpi=150)
sns.lineplot(data=covid_df.query("New_Cases_Recovered>0").groupby('Date')['New_Cases_Recovered'].sum().reset_index(),
x='Date', y='New_Cases_Recovered', ax=ax, color="#5555AA")
ax.set_title("Recovered Cases by Day")
ax.set_xticks(["2020-01-30", "2021-01-30", "2022-01-30"])
ax.set_ylabel("Cases")
plt.savefig("images/recovered_lineplot.png")
plt.show()
To finish this part of the project, let us build tables with the top five countries by confirmed cases, death cases, and mortality rate.
country_covid_df = covid_df.groupby('Country')[['Cases_Confirmed', 'Cases_Death']].max().reset_index()
country_covid_df['Mortality'] = round(country_covid_df['Cases_Death']/country_covid_df['Cases_Confirmed'], 3)
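The mortality ratio above is deaths divided by confirmed cases (a case-fatality ratio), rounded to three decimals; on toy numbers:

```python
import pandas as pd

toy = pd.DataFrame({
    'Country': ['A', 'B'],
    'Cases_Confirmed': [1000, 400],
    'Cases_Death': [25, 4],
})
# deaths per confirmed case, rounded to three decimal places
toy['Mortality'] = round(toy['Cases_Death'] / toy['Cases_Confirmed'], 3)
print(toy['Mortality'].tolist())  # [0.025, 0.01]
```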
fig, ax = plt.subplots(figsize=(4,2), dpi=150)
fig.patch.set_visible(False)
ax.axis('off')
ax.axis('tight')
ax.set_title("Top 5 Countries - Confirmed Cases")
ax.table(cellText=country_covid_df.sort_values('Cases_Confirmed', ascending=False).head().values, colLabels=country_covid_df.columns, loc='upper center')
fig.tight_layout()
plt.savefig("images/top5_confirmed.png")
plt.show()
fig, ax = plt.subplots(figsize=(4,2), dpi=150)
fig.patch.set_visible(False)
ax.axis('off')
ax.axis('tight')
ax.set_title("Top 5 Countries - Death Cases")
ax.table(cellText=country_covid_df.sort_values('Cases_Death', ascending=False).head().values, colLabels=country_covid_df.columns, loc='upper center')
fig.tight_layout()
plt.savefig("images/top5_death.png")
plt.show()
fig, ax = plt.subplots(figsize=(4,2), dpi=150)
fig.patch.set_visible(False)
ax.axis('off')
ax.axis('tight')
ax.set_title("Top 5 Countries - Mortality")
ax.table(cellText=country_covid_df.query("Mortality < 1").sort_values('Mortality', ascending=False).head().values, colLabels=country_covid_df.columns, loc='upper center')
fig.tight_layout()
plt.savefig("images/top5_mortality.png")
plt.show()
In this part of the project we are going to use the FPDF package to generate the report as a PDF file that can be scheduled and sent to key users. Let us start by creating the file and adding the report title. We are going to use the A4 sheet format (210 x 297 mm).
pdf = FPDF()
pdf.add_page()
pdf.set_font("Arial", 'B', 28)
pdf.cell(0, 10, "Covid Analytics Report", align='C', ln=2)
pdf.set_font("Arial", '', 14)
pdf.cell(0, 10, "Mateus Melo", align='C')
Now let us fill the first page of the report.
pdf.image('images/kpis.png', x = 5, y = 40, w = WIDTH-10, h = 0, type = '', link = '')
pdf.image('images/global_cases_map.png', x = 5, y = 60, w = WIDTH-10, h = 0, type = '', link = '')
pdf.image('images/confirmed_lineplot.png', x = 5, y = 180, w = WIDTH-10, h = 0, type = '', link = '')
pdf.image('images/death_lineplot.png', x = 5, y = 240, w = WIDTH-10, h = 0, type = '', link = '')
Now let us fill the second page of the report and finish the report.
pdf.add_page()
pdf.image('images/recovered_lineplot.png', x = 5, y = 20, w = WIDTH-10, h = 0, type = '', link = '')
pdf.image('images/top5_confirmed.png', x = 30, y = 80, w = WIDTH-60, h = 0, type = '', link = '')
pdf.image('images/top5_death.png', x = 30, y = 140, w = WIDTH-60, h = 0, type = '', link = '')
pdf.image('images/top5_mortality.png', x = 30, y = 200, w = WIDTH-60, h = 0, type = '', link = '')
pdf.output("report.pdf", 'F')
In this project we have generated an analytics report with worldwide COVID-19 case data. The method presented here may not be as polished or convenient as some of the most popular dataviz tools, but it pays off in cost and flexibility.